Abstract

Object parsing and segmentation from point clouds 
are challenging tasks because the relevant data is avail- 
able only as thin structures along object boundaries or 
other object features and is corrupted by large amounts 
of noise. One way to handle this kind of data is by em- 
ploying shape models that can accurately follow the ob- 
ject boundaries. Popular models such as Active Shape 
and Active Appearance models lack the necessary flex- 
ibility for this task. While more flexible models such 
as Recursive Compositional Models have been proposed, 
this paper builds on the Active Shape models and makes 
three contributions. First, it presents a flexible, mid- 
entropy, hierarchical generative model of object shape 
and appearance in images. The input data is explained 
by an object parsing layer, which is a deformation of 
a hidden PCA shape model with a Gaussian prior. Second,
it presents a novel efficient inference algorithm
that uses a set of informed data-driven proposals to 
initialize local searches for the hidden variables. Third, 
it applies the proposed model and algorithm to object 
parsing from point clouds such as edge detection im- 
ages, obtaining state of the art parsing errors on two 
out of three standard datasets without using any inten- 
sity information. 

1. Introduction 

Object parsing and segmentation are important 
problems with many applications in computer vision 
and medical imaging. While object segmentation is 
only directed towards extracting the object boundary, 
object parsing is aimed at identifying the object parts 
such as head, body, legs, etc. Active Shape and Ac- 
tive Appearance models [6] are popular methods that 
can address both object segmentation and object pars- 
ing. The Active Shape Models (ASM) contain a PCA 
shape model and alternate one step that searches for 
local boundary evidence on the shape normals with another
step that reprojects the evidence onto the PCA
plane, until convergence. The ASM method only uses 
partial image information existent on the shape nor- 
mals and its result depends on initialization. In the 
Active Appearance models (AAM), the object appear- 
ance is also modeled by PCA and a trained iterative 
algorithm takes an initial shape towards the solution, 
guided by the image. The AAM uses more image infor- 
mation than the ASM, but the result is still dependent 
on initialization. 

Another limitation of the ASM/AAM models is the
lack of accuracy, as a low-dimensional PCA shape cannot
accurately describe the shape variability existent
in real images, and is limited only to the main deformations.
This is illustrated in Figure 1, where a
10-dimensional PCA shape shown in green cannot accurately
follow the horse boundary and is off by a few
pixels around the ears, back, legs, etc.



Figure 1. Motivation for the hierarchical model. A shape 
described by PCA (shown with green dots) is not flexible 
enough to accurately follow the object boundary, but can 
serve as a backbone to limit the variability of the model. 

The approach introduced in this paper brings three 
contributions to object parsing and segmentation. 

First, it presents a hierarchical generative model 
that represents the object shape as a MRF-based defor- 
mation from a PCA backbone, obtaining a higher de- 
gree of accuracy than just the PCA model. If desired, 
sample shapes can be extracted from the generative 
model and can be used for integration or to compute 
marginal statistics. The shape model is completed with 
a data term that connects the image information with 
the current object shape. Due to the high accuracy of 
the shape description, this model can be used for ob- 
ject parsing from point clouds (such as those obtained 
from edge detection), where the data information is one
pixel wide. 

A second contribution is the optimization algorithm 
for finding a strong optimum for the hierarchical model. 
The algorithm uses a data-driven set of PCA candidates
to initialize local searches for the deformation
and PCA parameters. The proposed shape model and 
inference algorithm could in principle be adapted for 
other object parsing and segmentation applications by 
using other data terms, or even by using more accurate 
backbone shape models than the PCA. 

The third contribution is the application of the proposed
approach to parsing horses, cows and faces from
point clouds. An evaluation on Weizmann horses [4],
cows [13] and faces [24] shows that the proposed algorithm
obtains state of the art results on two of the
datasets without using any intensity information.

2. Related Work 

One of the most representative works in object pars- 
ing that can be applied to point clouds is the Recursive 
Compositional Models (RCM) [31]. RCM represents
the object shape in a hierarchical fashion using multi- 
ple levels of rotation-invariant models based on triplets 
of elements. The first level elements are the detected 
image edges, while the elements for each subsequent 
level are summaries of the triplets from the previous 
level. Inference is obtained using a version of dynamic 
programming with pruning. In contrast, our hierarchi- 
cal model represents the shape using a PCA model plus 
MRF deformations along the normals, and has only two 
levels in the hierarchy. Because of the high connectiv- 
ity of our proposed model, exact inference algorithms 
based on dynamic programming are not applicable. In- 
stead, we propose a smart search algorithm that makes 
local searches at a number of locations dictated by a 
bottom-up data-driven process. The advantage of our
approach is the simplicity of the model, which can be
easily learned from training examples. Our evaluation
shows that the errors obtained by our approach are
similar to those of the RCM, without using any image
intensity information.

A robust hierarchical shape model was constructed 
for multi-view car alignment [15]. The model allows
large deformations of the observed shape points and 
can also handle missing points due to occlusion or 
failures of the part detectors. Similar to our work, 
shape candidates are constructed from partial informa- 
tion obtained from part detectors to initialize a local 
search. However, these candidates have been directly 
generated using RANSAC, as the correspondence be- 
tween the part detections and the model points was 
known. In object parsing from point clouds, the
correspondence between input points and the object
points is not known and the fraction of outliers is usu- 
ally higher than 90%, making RANSAC computation- 
ally prohibitive (needing more than 10^ candidates). 

A related task to object parsing is object segmentation.
In [31], a separate step based on intensity information
is used to obtain the segmentation, while our
approach obtains the segmentation without using any
intensity information.

There is a large amount of work on region-based ob- 
ject segmentation [26l [28l El HI |7l |27] . However, these 
works cannot be directly applied to object segmenta- 
tion from point clouds where the data information is 
sparse and available only at the object boundaries. 

Torresani et al. [26] model the shape as a rigid transformation
plus PCA, without the MRF deformation
from our formulation.

The work of Ren et al. [18] is targeted to object
boundary detection. It does not obtain a clear object
segmentation or a parsing into object parts. Furthermore,
it uses both edge and gradient information as
input data.

Felzenszwalb and Schwartz [8] use a shape tree as
a model and focus on shape matching and retrieval, 
without evaluating the parsing error. 

Zhu et al, |^ use a circularity measure to find cy- 
cles with good continuations in edge detection images. 
However, the method does not use any shape model, so it
addresses a different problem than ours.

The Active Skeleton [1] uses a skeleton-based shape
model to detect objects from edge detection images. 
Even though in principle the method could be used 
for object parsing, it has not been evaluated for this 
purpose. 

Interactive Object Segmentation with Graph Cuts 
[12] imposes a shape prior on a Graph Cut energy.
However, the shape prior is based on a template with 
similarity transformation without any deformation and 
the Graph Cut energy is on pixels, so no object pars- 
ing or aligned boundary is obtained. This work was 
extended with a Kernel PCA shape prior [17], but still 
depending on manual initialization and obtaining just 
a segmentation without boundary alignment. In con- 
trast, our method obtains object parsing and boundary 
alignment, hence not only the object boundary but also 
the object parts are obtained. 

The knowledge based segmentation [3] uses a shape 
prior based on pairwise cliques between the shape 
points and a primal-dual algorithm for inference. In 
contrast, our framework uses a PCA-based model that 
cannot be decomposed in pairwise cliques and it could 
in principle be extended to work with non-linear shape 
models. 




Figure 2. Our approach starts by tracing the points into chains (left), finds data-driven PCA candidates (middle) that are
used to initialize local optimizations of the model parameters. The parameters of lowest energy give the parsing result
(right, black with colored dots) and associated PCA shape (right, green).



Groups of nearby contour segments are used in
[10, 21] to construct features for object detection.
Our paper uses similar contour fragments to construct
bottom-up data-driven candidates for searching the
shape space. The features from [10, 21] could be further
used to obtain better discriminative object models.
Currently we use a generative model, with some
parameters trained in a discriminative manner.

The part-based constellation model from [23] uses 
an extension of the contour segment network from [10] 
to construct object parts and a Metropolis-Hastings 
stochastic algorithm for inference. However, the 
method is used for object detection, where the pre- 
cise location of the boundary is not as important as in 
segmentation. 

Another closely related work is the unsupervised 
learning of shape models [11], which uses pairs of adjacent
contours [10] as features and a voting scheme
to find the object parameters. A separate deforma- 
tion step is then performed using Thin Plate Splines. 
In contrast, the inference algorithm from our work op- 
timizes a single criterion that combines the shape and 
deformation into a single hierarchical model. Moreover, 
our work is aimed towards object parsing, whereas [11] 
is used for object and boundary detection. 

The Active Basis Model [29] can obtain a sketch of 
an object using Gabor filters and has been successfully 
used for object detection. However, it has not been 
used for object segmentation, as it does not return a 
coherent object boundary. 

Our approach is inspired by [9], where an efficient 
version of the Hough transform for line detection is 
obtained by voting at locations given by least squares 
line fitting of clusters of approximately collinear pixels. 

Generating candidates based on partial information 
is similar to the beta channel from [30], where partially
occluded faces are detected by combining eye, nose and 
mouth detections. 



3. A Hierarchical Approach to Object 
Parsing 

We propose a hierarchical generative model with two 
levels of hidden variables that need to be inferred from 
the input data. The first level C is the actual object 
parsing while the second level is a PCA shape model 
that limits the degree of variability of the first level. 

The PCA shape is controlled by variables $(A, \beta)$
consisting of a similarity transformation $A$ and the
PCA coefficients $\beta \in \mathbb{R}^p$. We abuse the notation
by denoting $A$ as both the transformation parameters
$A = (u, v, s, \theta)$, with rotation $\theta$, translation $(u, v)$
and scale $s$, and the actual transformation

$A(x, y) = (s x \cos\theta + s y \sin\theta + u, \; -s x \sin\theta + s y \cos\theta + v)$

The PCA shape is

$S(A, \beta) = A(\mu_x + P_x \beta, \; \mu_y + P_y \beta) = (S_1, \dots, S_N)'$   (1)

where $\mu = (\mu_x, \mu_y)$, $\mu_x, \mu_y \in \mathbb{R}^{N \times 1}$ is the mean shape
and $P = (P_x, P_y)$, $P_x, P_y \in \mathbb{R}^{N \times p}$ are the PCA eigenvectors.
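As an illustration of Eq. (1), the following is a minimal NumPy sketch, not the authors' implementation. The function names `similarity_transform` and `pca_shape` and the array layout (one row per shape point) are assumptions made for this example.

```python
import numpy as np

def similarity_transform(x, y, u, v, s, theta):
    """Apply A(x, y) = (s x cos(t) + s y sin(t) + u, -s x sin(t) + s y cos(t) + v)."""
    c, sn = np.cos(theta), np.sin(theta)
    return s * x * c + s * y * sn + u, -s * x * sn + s * y * c + v

def pca_shape(A, beta, mu_x, mu_y, P_x, P_y):
    """Return the N x 2 array of PCA shape points S(A, beta) from Eq. (1).

    A = (u, v, s, theta); mu_x, mu_y have length N; P_x, P_y are N x p.
    """
    u, v, s, theta = A
    x = mu_x + P_x @ beta          # x coordinates before the similarity transform
    y = mu_y + P_y @ beta          # y coordinates before the similarity transform
    xt, yt = similarity_transform(x, y, u, v, s, theta)
    return np.column_stack([xt, yt])
```

With $\beta = 0$ and the identity transformation $(0, 0, 1, 0)$, the function returns the mean shape.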




Figure 3. The shape $C$ is obtained as a deformation $d =
(d_1, \dots, d_N)$ of a PCA shape $(A, \beta)$ along the normals.

The representation is illustrated in Figure 3.
The shape $C$ (black) is obtained from the PCA
shape (green) using a vector of displacements $d =
(d_1, \dots, d_N) \in [-d_{max}, d_{max}]^N$ along the normals to the
shape. More exactly, the shape $C = C(d)$ consists
of a number of line segments $C_i C_{i+1}$, where $C_i =
S_i + n_i d_i$, $i = 1, \dots, N$, and $n_i$ is the normal to the PCA
shape at $S_i$.
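The construction of the shape from the displacements can be sketched as follows. This is an illustrative example only: the text does not say how the normals are computed, so the central-difference tangents used here, and the names `polyline_normals` and `deform_shape`, are assumptions. Consecutive shape points are assumed to be distinct.

```python
import numpy as np

def polyline_normals(S, closed=False):
    """Unit normals to the polyline S (N x 2), from finite-difference tangents."""
    S = np.asarray(S, dtype=float)
    if closed:
        t = np.roll(S, -1, axis=0) - np.roll(S, 1, axis=0)
    else:
        t = np.gradient(S, axis=0)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return np.column_stack([-t[:, 1], t[:, 0]])   # tangent rotated by 90 degrees

def deform_shape(S, d, d_max, closed=False):
    """Return the points S_i + n_i d_i, with d clipped to [-d_max, d_max]."""
    S = np.asarray(S, dtype=float)
    d = np.clip(np.asarray(d, dtype=float), -d_max, d_max)
    return S + polyline_normals(S, closed) * d[:, None]
```

For a horizontal polyline traversed left to right, positive displacements move the points in the +y direction.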



3.1. The Hierarchical Generative Model 

The model can be represented either as a probability
or an energy. For simplicity, we use an energy formulation
of the model

$E(C, A, \beta) = E_{data}(C) + E_{shape}(C, A, \beta) + E_p(A)$   (2)

containing a data term $E_{data}(C)$ that relates the input
data with the parsing result $C$, a shape term
$E_{shape}(C, A, \beta)$ and a prior $E_p(A)$ on the possible
transformations.

The data term $E_{data}(C)$ is application specific
and is based on the exact location of the shape
$C = (C_1, \dots, C_N)$. When the input data consists of
noisy point clouds such as edge detection, the input
points are traced into point chains based on the
8-neighborhood. The data term encourages consecutive
points $C_i, C_{i+1}$ to be on the same point chain:

$E_{data}(C) = \sum_{i=1}^{N-1} \varphi(C_i, C_{i+1})$   (3)

where $\varphi(C_i, C_{i+1}) = -\delta$ if and only if $C_i, C_{i+1}$ are on
the same point chain and $\varphi(C_i, C_{i+1}) = 0$ otherwise.
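A minimal sketch of the data term of Eq. (3) is given below. It assumes that the traced chains are stored in a hypothetical integer label image `chain_id` (0 for pixels on no chain), that shape points are rounded to the nearest pixel, and that the pairwise term is zero when two points are not on the same chain. None of these choices is specified in the text.

```python
import numpy as np

def data_energy(C, chain_id, delta):
    """E_data(C): -delta for every consecutive pair of points on the same chain.

    C: N x 2 array of (x, y) points; chain_id: H x W integer label image.
    """
    C = np.asarray(C, dtype=float)
    rows = np.rint(C[:, 1]).astype(int)
    cols = np.rint(C[:, 0]).astype(int)
    h, w = chain_id.shape
    inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    labels = np.zeros(len(C), dtype=int)          # points outside the image get label 0
    labels[inside] = chain_id[rows[inside], cols[inside]]
    same = (labels[:-1] > 0) & (labels[:-1] == labels[1:])
    return -delta * int(np.count_nonzero(same))
```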



[Diagram: PCA Model $(A, \beta)$ with Prior Models $E_p(A)$, $E_p(\beta)$; Object Shape $C = (C_1, \dots, C_N)$; Noisy Point Cloud (edge detection) with $E_{data}(C) = \sum_i \varphi(C_i, C_{i+1})$]

Figure 4. Diagram of the proposed hierarchical model.



The shape term

$E_{shape}(C, A, \beta) = E(C|A, \beta) + E_p(\beta)$   (4)

consists of a deformation term $E(C|A, \beta)$ that connects
the parsed shape with the underlying PCA model and
a prior $E_p(\beta)$ on the PCA coefficients.

The deformation term is a Gaussian MRF that encourages
the curve (or curves) to be parallel and close
to the PCA shape

$E(C|A, \beta) = \sum_i \alpha_i d_i^2 + \sum_i \gamma_i (d_i - d_{i-1})^2$   (5)

and is defined in terms of the displacements $d_i$ of the
curve points $C_i$ from the corresponding PCA shape
points $S_i$. The coefficients $\alpha_i, \gamma_i$ represent the amount
of penalty for the deformation at different points along
the shape. In our applications all $\alpha_i$ have the same
value $\alpha_i = \alpha$ and similarly $\gamma_i = \gamma$, but this could result
in a decrease in model accuracy. For example, the $\alpha_i$
for points on the horse head could have smaller values
because there is more variability for those points.

The prior $E_p(\beta)$ on the PCA parameters is a Gaussian
prior based on the PCA eigenvalues $\lambda_i$

$E_p(\beta) = \rho \sum_{i=1}^{p} \frac{\beta_i^2}{\lambda_i}$   (6)

The prior $E_p(A)$ for $A = (u, v, s, \theta)$ forces the scale
and rotation within a range and discourages translations
away from the image center $(x_c, y_c)$:

$E_p(A) = \begin{cases} \infty & \text{if } s \notin [s_{min}, s_{max}] \text{ or } |\theta| > \theta_{max} \\ r|u - x_c| + r|v - y_c| & \text{else} \end{cases}$   (7)



The model parameters $\Theta = (\alpha, \gamma, \delta, \rho, r)$ are learned
in a supervised manner on the training set through a
procedure described in Section 3.5.



One advantage of the generative model described
in Eq. (4) is that one could easily obtain samples
from this model, by sampling the PCA coefficients
$\beta$ from the Gaussian prior $\beta \sim \exp(-E_p(\beta))$ and
the deformation field $d$ from the Gaussian MRF
$d \sim \exp(-\sum_i [\alpha_i d_i^2 + \gamma_i (d_i - d_{i-1})^2])$.
Figure 5 shows a few samples from the learned horse model,
with the PCA shape $S$ shown in green and the sampled
shape $C$ in black.
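Sampling the deformation field can be sketched as follows, under stated assumptions: an open chain (no term linking the last point to the first), constant coefficients $\alpha_i = \alpha$ and $\gamma_i = \gamma$, and no truncation to $[-d_{max}, d_{max}]$. The energy is then the quadratic form $d' Q d$ with $Q = \alpha I + \gamma D' D$, where $D$ is the first-difference matrix, so that $d \sim N(0, (2Q)^{-1})$. The function names are hypothetical.

```python
import numpy as np

def mrf_precision(n, alpha, gamma):
    """Matrix Q with d' Q d = alpha * sum(d_i^2) + gamma * sum((d_i - d_{i-1})^2)."""
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference matrix
    return alpha * np.eye(n) + gamma * D.T @ D

def sample_displacements(n, alpha, gamma, rng, size=1):
    """Draw `size` samples d ~ exp(-d' Q d), i.e. Gaussian with covariance (2Q)^-1."""
    Q = mrf_precision(n, alpha, gamma)
    L = np.linalg.cholesky(2.0 * Q)           # 2Q = L L'
    z = rng.standard_normal((n, size))
    return np.linalg.solve(L.T, z).T          # L'^-1 z has covariance (L L')^-1

# Example with the values reported later in the text (alpha = 0.04, gamma = 0.1).
samples = sample_displacements(96, 0.04, 0.1, np.random.default_rng(0), size=3)
```

Each row of `samples` is one displacement vector that can be applied along the normals of a PCA shape.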




Figure 5. Sample shapes from the shape model (4) with the
parameters of the learned horse model. PCA shape (green)
and sampled shape (black).



3.2. Inference Algorithm 

Finding the object parsing $C$ and the PCA parameters
$(A, \beta)$ is a nontrivial optimization problem. However,
if the PCA parameters $(A, \beta)$ are known, the
parsing $C$ is uniquely determined by the displacement
vector $d = (d_1, \dots, d_N)$, hence $C = C(d)$. In this case
finding the optimal $C(d)$ is equivalent to finding $d$ that
minimizes $E(C(d), A, \beta) = E(d)$. This can be done
efficiently by dynamic programming, due to the additive
nature of the model when the PCA shape $(A, \beta)$ is fixed.
Up to terms that do not depend on $d$,

$E(d) = \sum_{i=1}^{N-1} \varphi(C_i, C_{i+1}) + \sum_i \alpha_i d_i^2 + \sum_i \gamma_i (d_i - d_{i-1})^2$

If the parsing $C$ is fixed, an approximate minimum of
$E(C, A, \beta)$ can be obtained by least square fitting of
the PCA shape parameters $(A, \beta)$.
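The dynamic programming step can be sketched as a standard Viterbi-style recursion over a chain. The sketch assumes that each displacement $d_i$ is restricted to a finite set of levels (the text does not give the discretization), that the chain is open, and that the data term has already been evaluated for every pair of levels and added to `pairwise`. The names `chain_dp` and `deformation_costs` are hypothetical.

```python
import numpy as np

def chain_dp(unary, pairwise):
    """Minimize sum_i unary[i, k_i] + sum_i pairwise[i, k_i, k_{i+1}].

    unary: N x K costs; pairwise: (N-1) x K x K costs.
    Returns the optimal label per point and the minimum energy.
    """
    n, K = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n, K), dtype=int)
    for i in range(1, n):
        total = cost[:, None] + pairwise[i - 1]   # rows: previous label, cols: current
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(K)] + unary[i]
    labels = np.empty(n, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):                 # backtrack the optimal path
        labels[i - 1] = back[i, labels[i]]
    return labels, float(cost.min())

def deformation_costs(levels, n, alpha, gamma):
    """Unary and pairwise costs of the deformation term for displacement levels."""
    levels = np.asarray(levels, dtype=float)
    unary = np.tile(alpha * levels ** 2, (n, 1))
    pair = gamma * (levels[None, :] - levels[:, None]) ** 2
    return unary, np.tile(pair, (n - 1, 1, 1))
```

The cost of the recursion is $O(N K^2)$ for $N$ points and $K$ displacement levels, which is what makes repeating it from many candidates affordable.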

Therefore, if the PCA shape parameters $(A, \beta)$ are
initialized close to their optimal values, an approach
that alternates the above two steps, namely the computation
of the parsing $C$ by dynamic programming
and the estimation of the PCA parameters $(A, \beta)$ by
least squares, will converge to an approximate local optimum
of $E(C, A, \beta)$ in a few iterations. This approach
is similar in spirit to the Active Shape Model, with the
difference that a consistent low energy deformation $C$
is found by optimization in our method instead of finding
data evidence on each normal independently as the
ASM does.

We will use a data-driven approach described in
Section 3.3 to find a number of candidate shapes
$(A_i, \beta_i), i = 1, \dots, N^{cand}$ for initialization of the local
optimum search described above. The final solution is
obtained as the lowest energy configuration $(C, A, \beta)$
among the $N^{cand}$ local optima obtained. The whole
optimization algorithm is described in Algorithm 1 below.

Algorithm 1 Optimization Algorithm

Input: Noisy point cloud (e.g. edge detection image),
PCA candidates $(A_i, \beta_i), i = 1, \dots, N^{cand}$.
Output: Near-optimal hidden variables $(C, A, \beta)$.
for $i = 1$ to $N^{cand}$ do
    for $j = 1$ to $N_{iter}$ do
        Find displacement vector $d$ using dynamic programming

        $d = \arg\min_d E_{data}(C(d)) + E(d | A_i, \beta_i)$.   (8)

        Refit $(A_i, \beta_i)$ by least squares on $C(d)$
    end for
    Obtain $C_i = C(d)$.
end for
Find $j = \arg\min_i E(C_i, A_i, \beta_i)$
Obtain $(C, A, \beta) = (C_j, A_j, \beta_j)$
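The control flow of Algorithm 1 can be sketched as below. This is only a skeleton: the callables `parse_fn` (the dynamic programming step), `refit_fn` (the least squares refit) and `energy_fn` (the full energy) are hypothetical placeholders that the caller must supply, and `n_iter` is assumed to be at least 1.

```python
def multi_start_search(candidates, parse_fn, refit_fn, energy_fn, n_iter=10):
    """Run the alternating local search from every candidate; keep the lowest energy.

    candidates: iterable of (A, beta) initializations.
    Returns (energy, C, A, beta) for the best local optimum, or None if
    there are no candidates.
    """
    best = None
    for A, beta in candidates:
        C = None
        for _ in range(n_iter):
            C = parse_fn(A, beta)      # parsing C(d) for fixed (A, beta)
            A, beta = refit_fn(C)      # refit of (A, beta) on the current parsing
        e = energy_fn(C, A, beta)
        if best is None or e < best[0]:
            best = (e, C, A, beta)
    return best
```

Because the local searches are independent of each other, the outer loop can be distributed across workers without changing the result.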

As the model energy (2) is just an approximation of
the true object shape model, it is possible that other 
ways to combine the candidates such as weighted av- 
eraging [22] might be better than choosing the lowest 
energy one. This is subject to further investigation. 



3.3. PCA Candidate Generation 

The PCA shape candidates are obtained by matching
one or more contour fragments to parts of the PCA
model. The contour fragments are similar to [10, 21]
and are obtained in a preprocessing step described in
Section 3.4 below.

An initial set of PCA candidates can be obtained from
one contour fragment, as described in Section 3.3.1. If
more accuracy is desired, these initial candidates can
be refined by matching other contour fragments near
the candidates to other parts of the PCA model, as
described in Section 3.3.2.

3.3.1 Candidate Generation from One Contour Fragment

These PCA candidates are obtained by matching a contour
fragment to different parts of the PCA model. To
speed up computation, an interval $[L(l), U(l)]$ for the
number of PCA points that match contour fragments
of length $l$ (made integer) is obtained from the training
set and the ground truth annotations.

The method for obtaining the PCA candidates from
one contour fragment is described in Algorithm 2.

Algorithm 2 CG1($N^{cand}$)

Input: Contour fragments $c$ of length $len(c) \in [l_{min}, l_{max}]$.
Output: At most $N^{cand}$ different PCA shape candidates
$(A_i, \beta_i)$ with matches $(c_i, b_i, k_i)$.
for any contour fragment $c$ do
    for any $k$ with $L(l) \le k \le U(l)$ where $l = len(c)$ do
        Subsample $c$ evenly to have $k$ points $p_1, \dots, p_k$
        for $1 \le b \le N$ do
            Fit points $b, \dots, b + k - 1$ of PCA shape $(A, \beta)$
            to $p_1, \dots, p_k$ in a least square sense.
            Discard $(A, \beta)$ if the matching error is above
            a threshold.
        end for
    end for
end for
Perform Non-Max Suppression to keep at most
$N^{cand}$ candidates.

The Weighted PCA [19], described in the Appendix,
is used to fit in a least square sense a given subset of a
PCA shape to a number of points $p_1, \dots, p_k$.

The non-maximal suppression step finds the candidate
of smallest fitting error and removes all candidates
at average point-to-point distance at most $D^{nms}$ from
it, then adds the remaining candidate of smallest error,
and so on.
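The non-maximal suppression step described above can be sketched as a greedy loop. The sketch assumes that all candidate shapes have the same number of points in corresponding order; the function name and the parameter names `d_nms` and `n_cand` are chosen for this example.

```python
import numpy as np

def non_max_suppression(shapes, errors, d_nms, n_cand):
    """Greedy suppression of candidate shapes.

    shapes: M x N x 2 array of candidate shapes; errors: M fitting errors.
    Returns the indices of at most n_cand kept candidates, best first.
    """
    shapes = np.asarray(shapes, dtype=float)
    order = np.argsort(errors)                 # smallest fitting error first
    alive = np.ones(len(order), dtype=bool)
    keep = []
    for idx in order:
        if not alive[idx]:
            continue
        keep.append(int(idx))
        if len(keep) == n_cand:
            break
        # Average point-to-point distance from every candidate to the kept one.
        dist = np.linalg.norm(shapes - shapes[idx], axis=2).mean(axis=1)
        alive &= dist > d_nms
    return keep
```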



For each obtained PCA shape candidate $(A_i, \beta_i), i =
1, \dots, N^{cand}$, we also remember the contour fragment $c_i$
and match location $b_i, k_i$ that were used to generate it.

Figure 6, left, shows the closest candidate to the
ground truth among $N^{cand} = 200$ candidates obtained
by Algorithm 2.



Figure 6. The best candidate obtained from one (left) and 
two (right) contour fragments. The fragments that gener- 
ated each candidate are shown in black. 



3.3.2 Candidate Generation from Two Contour Fragments

Usually images contain more than one contour fragment
of the object to be segmented. We can refine
a candidate obtained by CG1 by fitting it simultaneously
to the contour fragment it was obtained from and
to another fragment close to the shape. Experiments
in Section 4 show that this strategy can improve the
quality of the candidates and of the final result. The
details of this strategy are given in Algorithm 3.

Algorithm 3 CG2($N^{cand}$)

Input: PCA shape candidates $(A_i, \beta_i)$ with matches
$(c_i, b_i, k_i)$ from CG1 and contour fragments $c$.
Output: At most $N^{cand}$ different PCA shape candidates
$(A_i, \beta_i)$.
for $i = 1$ to $N_1^{cand}$ do
    Set $(P_1, \dots, P_N)' = S(A_i, \beta_i)$ from Eq. (1).
    for any contour fragment $c$ do
        Find $P_j, P_k$, $1 \le j, k \le N$ closest to the beginning
        and end of $c$
        if $d(c, P_j) + d(c, P_k) < 2 d^{max}$ and $[j, k]$ does
        not overlap with $[b_i, b_i + k_i - 1]$ then
            Subsample $c$ to have $m = k - j + 1$ points $p_1, \dots, p_m$
            Find PCA shape $(A, \beta)$ that fits points
            $b_i, \dots, b_i + k_i - 1$ through $c_i$ and $j, \dots, k$ through
            $p_1, \dots, p_m$ in a least square sense.
            Discard $(A, \beta)$ if the matching error is above
            a threshold.
        end if
    end for
end for
Perform Non-Max Suppression to keep at most
$N^{cand}$ candidates.



3.4. Preprocessing 

Preprocessing begins with tracing the input points 
into point chains based on the 8-neighborhood. The 
point chains are then subsampled every 5-6 pixels to 
reduce the number of contour fragments obtained. 












Figure 7. Left: the input points are traced into point chains 
and subsampled every 5-6 pixels. Right: smooth contour 
fragments (black) are fitted through the point chains start- 
ing and ending in the subsampled pixels. 



The contour fragments used by the candidate generators
are represented as polynomials of degree three
relative to a system of coordinates aligned with the
contour's endpoints, as illustrated in Figure 8. The
contour fragment endpoints are two of the subsampled
points of the same traced point chain and the polynomial
is fitted in a least square sense through all the
chain points in between. The fragments are restricted
in length to a range $[l_{min}, l_{max}]$. Only the fragments
with a maximum error at most $e_{max} = 1.5$ are kept.
Non-maximal contour fragments (based on the sets of
chain points they were constructed from) are removed.
An example of obtained contour fragments is shown in
Figure 7, right.
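The fragment fitting can be sketched with `numpy.polyfit`. The sketch assumes that the fitting error is the residual measured perpendicular to the line through the two endpoints, which the text does not state explicitly. The function name is hypothetical, the two endpoints must be distinct, and at least four points are needed for a well-posed cubic fit.

```python
import numpy as np

def fit_contour_fragment(points, degree=3, e_max=1.5):
    """Fit a polynomial to chain points in the frame aligned with the endpoints.

    points: M x 2 array; the first and last rows are the fragment endpoints.
    Returns (coefficients, max_error), or None if the maximum error exceeds e_max.
    """
    points = np.asarray(points, dtype=float)
    p0, p1 = points[0], points[-1]
    t = (p1 - p0) / np.linalg.norm(p1 - p0)    # unit vector along the endpoints
    n = np.array([-t[1], t[0]])                # unit vector perpendicular to it
    rel = points - p0
    x, y = rel @ t, rel @ n                    # coordinates in the aligned frame
    coeffs = np.polyfit(x, y, degree)
    max_err = float(np.max(np.abs(np.polyval(coeffs, x) - y)))
    return (coeffs, max_err) if max_err <= e_max else None
```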




Figure 8. A contour fragment (red dashed) is a polynomial 
fit of a subset of a chain of points (shown in gray). 

3.5. Learning the Model and Algorithm Parameters 

The proposed model is very simple. It consists of a 
PCA model with at most 10 principal directions plus a 
small number (<20) of parameters. Because the model
is small, it can be expected to generalize well
to unseen data if the training data is representative.

The PCA model is learned in the standard way using 
Procrustes analysis to align the training shapes. 

For the rest of the parameters that need to be learned we
adopt a supervised approach employed in other MRF-based
methods [2, 16, 20], namely learning the parameters
by optimizing a loss function on the training set.
To speed up the parameter learning, for candidate generators
we employ loss functions that directly evaluate
the generated candidates instead of the final result.

The parameters of the candidate generators are
learned first, in the order CG1 and CG2, using the
minimum of the average point-to-point distances from
the candidates to the ground truth annotation (described
in Section 4) as loss function. This speeds up
the learning process since the CG parameters are this
way decoupled from the later modules. Other measures,
such as detection rate/false positive rate for the
contour fragments, could be used instead and are subject
to further investigation. The number of PCA components
was fixed to $p = 4, 8$ for CG1 and CG2 respectively,
except for the faces, where it was $p = 2, 4$ for
CG1 and CG2 respectively.

We adopted a coordinate descent optimization 
(where one parameter is optimized at a time) and 
picked values that balance speed and accuracy. The 
obtained parameters for CG1 are $l_{min} = 20$, $l_{max} =$
60, ATf^^^ = 200, L)^^^^ = 5, and 7V^^^^ = 400, 1)^^^ = 
g^^max ^ 20 for CG2. 

Figure 9 shows the average errors of the closest
candidate obtained by CG1 and CG2 on the training and
test sets vs. the number of candidates $N^{cand}$.




Figure 9. Candidate generator error vs. number of candi- 
dates for the horse (left) and cow (right) datasets. 

Figure 9 shows that the test error decreases as
the training error decreases, which means that there is
minimal overfitting for the candidate generators. Observe
that CG2 obtains better candidates than CG1
(closer to the ground truth), especially for the cow images.

Similar behaviors were observed for the $D^{nms}$
parameters and for the $d^{max}$ parameter of CG2.

The model parameters $\Theta = (\alpha, \gamma, \delta, \rho, r)$ are learned
based on the average point-to-point distance between
the obtained parsing results and the ground truth annotation,

$Err(\Theta) = \frac{1}{n} \sum_{i=1}^{n} err_i(\Theta)$   (9)

where $err_i(\Theta)$ is the average point-to-point error of the
parsing result obtained with parameters $\Theta$ on example
$i$ using CG1, and $n$ is the number of training examples.
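The error measure of Eq. (9) can be sketched as below, assuming that each parsing result and its annotation contain the same number of points in corresponding order. The function names are chosen for this example.

```python
import numpy as np

def point_to_point_error(C, G):
    """Average Euclidean distance between corresponding points of C and G (N x 2)."""
    diff = np.asarray(C, dtype=float) - np.asarray(G, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def training_error(parsings, annotations):
    """Mean of the per-example average point-to-point errors, as in Eq. (9)."""
    errs = [point_to_point_error(C, G) for C, G in zip(parsings, annotations)]
    return float(np.mean(errs))
```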



The model parameters $\delta, \rho, \alpha_i = \alpha, r, p$ were obtained
by optimizing the error measure (9) on the
training set by coordinate descent, with parameters
$\gamma_i = 0.1$, $N_{iter} = 10$ fixed. The obtained values are
given in Table 1.

Table 1. Learned parameters for Algorithm 1.

Dataset               $\alpha_i$   $\delta$   $\rho$   $r$    $p$
Weizmann horses [4]   0.04         2          2        1      10
Cows [13]             0.04         2          2        0.5    10
IMM Faces [24, 25]    0.04         7          2        0.5    6


Figure 10. The parsing error measure (9) vs. $\delta$ for the horses
(left) and cows (right).

The dependence of the error on the values of $\delta$ and
$p$ is shown in Figure 10. Again, the test errors follow
the training errors.

4. Experimental Results 

We evaluated this approach on three datasets: the
Weizmann dataset [4] containing 328 horse images
with object segmentations as binary masks, the Cows
dataset [13] with 111 cow images and the IMM face
dataset from [24, 25]. We used the same subsets of
images as [31] for training and testing the Weizmann
dataset and the first 25 images for training the Cows
dataset and tested on all 111 images.

Each horse and cow was manually annotated with
14 control points on the boundary, as illustrated in
Figure 11, left. For fairness, the same horse and cow
legs were annotated as in [31]. Smooth curves were obtained
between the control points by dynamic programming
to minimize the average distance to the object
boundary from the binary mask. Intermediate points
were obtained by dividing the smooth curves into equal
parts. The obtained annotation is shown in Figure 11,
right, with 96 points for the horses and 87 points for
the cows.

We also evaluated a standard Active Shape Model
[6] initialized in the center of the image with average
scale, no rotation ($\theta = 0$) and 20 update iterations.

The results are summarized in Tables 2, 3 and 4.
Fig. 12 plots the sorted errors on the datasets.



Table 2. Performance of different methods on the Weizmann Horse dataset.

Method                                Train    Test     Contour  Train   Test    Time/img
                                      images   images   points   error   error   (sec)
Active Shape Model [6]                50       227      96       25.35   29.05   <1
Recursive Compositional Models [31]   50       227      27       -       16.04   23
Ours, with CG1                        50       227      96       12.79   15.58   44
Ours, with CG2                        50       227      96       12.74   15.41   69
Ours, with CG2, no head or legs       50       227      60       8.21    11.42   20



Table 3. Performance of different methods on the Cows dataset.

Method                                Train    Test     Contour  Train   Test    Time/img
                                      images   images   points   error   error   (sec)
Active Shape Model [6]                25       111      87       48.81   49.23   <1
Recursive Compositional Models [31]   1        111      27       -       15.8    3.5

Figure 11. Left: each image is manually annotated with
14 control points and smooth curves (yellow) are obtained
between the control points using the ground truth segmentation.
Right: The obtained boundary annotation with 96
points (horses) and 87 points (cows).



Different error percentiles can be obtained from the sorted
errors plotted in Fig. 12.
The Recursive Compositional Model [31] also reports
average point-to-point errors on the Weizmann and cow
datasets but uses both edge and intensity information,
unlike our approach, which only uses edge information.

We are not aware of any fully automatic face alignment
results on the IMM face dataset [24]. The results
published in [24] and other publications referring
to this dataset initialize their algorithms at locations
close to the true location and report the error after
convergence.

The PCA model has difficulties modeling the shape
variability of the horse head and legs. If the head and
leg points are removed, the training and test errors decrease
substantially, as can be seen in the last row
of Table 2. This experiment suggests that other shape
models with free parameters for the head and leg positions
might be more appropriate than PCA for the
higher level model. Such models are subject to further
investigation.

Parsing examples using CG2 are shown in Figures
13, 14, 15 and 16.



5. Conclusion and Future Work 

This paper presented a method for object parsing
from noisy point clouds such as edge detection results.
The object shape is modeled as an MRF deformation
of a hidden PCA model. The model parameters are
inferred by an algorithm that searches for the energy
minimum through many local searches starting from
a number of data-driven initializations. The experimental
results show that our method is competitive
with modern approaches for object parsing from point
clouds such as the Recursive Compositional Models [31]
and Active Shape Models [6].

The candidate generators and the parsing module
can be easily parallelized, and a 10-100 times
speedup is expected from a GPU implementation.

In the future, we plan to investigate more accurate 
models for the higher level, with free parameters for 
the head and leg positions. We also plan to extend 
the method to 3D object parsing using approximate 
inference methods such as Graph Cuts or Belief Propagation
for the boundary matching.

Appendix: Weighted PCA

A partial PCA model can be fit to a number of
points using the weighted least squares method [19],
summarized in Algorithm 4. The weights of missing
PCA points are set to zero. The weighted alignment
between the shapes $S_1, S_2$ has been described in Appendix C
from [6].



Algorithm 4 FitWeightedPCA

Input: Shape $S_1 = (x_1, y_1)$, weight vector $w =
(w_1, \dots, w_N)'$, $\|w\|_1 = \sum_{i=1}^{N} w_i = 1$.
Output: Weighted least-square fit parameters $(A, \beta)$.
Set $W = diag(w_1, \dots, w_N)$
Set $K_x = (P_x' W^2 P_x)^{-1} P_x' W^2$, $K_y = (P_y' W^2 P_y)^{-1} P_y' W^2$
Set $\beta = 0$
for $i = 1$ to $N_{it}$ do
    Set $S_2 = (x_2, y_2)$, $x_2 = \mu_x + P_x \beta$, $y_2 = \mu_y + P_y \beta$
    Solve the weighted alignment system (Appendix C from [6])
    for $a, b, d_x, d_y$, with $\sum_i w_i = \|w\|_1 = 1$ and
        $S_{xx} = x_2' W x_2$, $S_{yy} = y_2' W y_2$
        $S_1 = x_1' W x_2 + y_1' W y_2$
        $S_2 = y_1' W x_2 - x_1' W y_2$
    Obtain $A(x, y) = (a x + b y + d_x, \; -b x + a y + d_y)$
    Find $(x_0, y_0) = A^{-1}(S_1)$
    Set $\beta = K_x (x_0 - \mu_x) + K_y (y_0 - \mu_y)$
end for
Set $s = \sqrt{a^2 + b^2}$, $\theta = \arctan(b/a)$.
Obtain $A = (d_x, d_y, s, \theta)$




Figure 13. Results on test set. Left: Traced points. Right: 
parsing result (black) with associated PCA shape (green). 




Figure 14. More results on test set. 




Figure 15. Results on the cow dataset. 



Figure 16. More results on the cow dataset. 



